Bounded Parameter Markov Decision Processes with Average Reward Criterion
Authors
Abstract
Bounded parameter Markov Decision Processes (BMDPs) address the issue of dealing with uncertainty in the parameters of a Markov Decision Process (MDP). Unlike the case of an MDP, the notion of an optimal policy for a BMDP is not entirely straightforward. We consider two notions of optimality based on optimistic and pessimistic criteria. These have been analyzed for discounted BMDPs. Here we provide results for average reward BMDPs. We establish a fundamental relationship between the discounted and the average reward problems, prove the existence of Blackwell optimal policies and, for both notions of optimality, derive algorithms that converge to the optimal value function.
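To make the optimistic and pessimistic criteria concrete, the following is a minimal sketch of discounted value iteration over interval transition probabilities. It is not the algorithm derived in the paper. The model layout (a list of states, each a list of `(reward, lo, hi)` actions), the function names, and the toy numbers are all assumptions made for illustration, and rewards are taken as exact for simplicity. The last lines scale the discounted values by `1 - gamma`, which for a discount factor close to 1 gives a rough estimate of the long-run average reward in this toy model.

```python
from math import inf

def extreme_distribution(lo, hi, values, optimistic):
    """Pick the distribution within [lo, hi] that maximizes (optimistic)
    or minimizes (pessimistic) the expected value of `values`.
    Assumes sum(lo) <= 1 <= sum(hi)."""
    p = list(lo)                       # every state gets at least its lower bound
    slack = 1.0 - sum(lo)              # probability mass still to be assigned
    # Visit best states first when optimistic, worst states first when pessimistic.
    order = sorted(range(len(values)), key=lambda s: values[s], reverse=optimistic)
    for s in order:
        add = min(hi[s] - lo[s], slack)
        p[s] += add
        slack -= add
    return p

def interval_value_iteration(bmdp, gamma, optimistic, iters=2000):
    """Discounted value iteration where the transition model is chosen
    in the agent's favor (optimistic) or against it (pessimistic)."""
    v = [0.0] * len(bmdp)
    for _ in range(iters):
        new_v = []
        for actions in bmdp:
            best = -inf
            for reward, lo, hi in actions:
                p = extreme_distribution(lo, hi, v, optimistic)
                q = reward + gamma * sum(pi * vi for pi, vi in zip(p, v))
                best = max(best, q)
            new_v.append(best)
        v = new_v
    return v

# Hypothetical two-state model with one action per state.
# Each action is (reward, lower bounds, upper bounds) over next states.
toy_bmdp = [
    [(0.0, [0.5, 0.2], [0.8, 0.5])],   # state 0
    [(1.0, [0.1, 0.6], [0.4, 0.9])],   # state 1
]

gamma = 0.99
v_opt = interval_value_iteration(toy_bmdp, gamma, optimistic=True)
v_pes = interval_value_iteration(toy_bmdp, gamma, optimistic=False)

# Scaled discounted values as a rough proxy for average reward.
gain_opt = [(1 - gamma) * x for x in v_opt]
gain_pes = [(1 - gamma) * x for x in v_pes]
```

The greedy step in `extreme_distribution` works because the expected value is linear in the probabilities: starting from the lower bounds, any leftover mass is best spent on the most preferred states first, up to their upper bounds.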
Similar resources
Semi-Markov Decision Processes
Considered are infinite horizon semi-Markov decision processes (SMDPs) with finite state and action spaces. Total expected discounted reward and long-run average expected reward optimality criteria are reviewed. Solution methodology for each criterion is given; constraints and variance sensitivity are also discussed.
Multi-Criteria Approaches to Markov Decision Processes with Uncertain Transition Parameters
Markov decision processes (MDPs) are a well-established model for planning under uncertainty. In most situations the MDP parameters are estimates from real observations such that their values are not known precisely. Different types of MDPs with uncertain, imprecise or bounded transition rates or probabilities and rewards exist in the literature. Commonly the resulting processes are optimized wi...
Reachability analysis of uncertain systems using bounded-parameter Markov decision processes
Verification of reachability properties for probabilistic systems is usually based on variants of Markov processes. Current methods assume an exact model of the dynamic behavior and are not suitable for realistic systems that operate in the presence of uncertainty and variability. This research note extends existing methods for Bounded-parameter Markov Decision Processes (BMDPs) to solve the re...
A Probabilistic Analysis of Bias Optimality in Unichain Markov Decision Processes
Since the long-run average reward optimality criterion is underselective, a decision-maker often uses bias to distinguish between multiple average optimal policies. We study bias optimality in unichain, finite state and action space Markov Decision Processes. A probabilistic approach is used to give intuition as to why a bias-based decision-maker prefers a particular policy over another. Using rel...
Average-Reward Decentralized Markov Decision Processes
Formal analysis of decentralized decision making has become a thriving research area in recent years, producing a number of multi-agent extensions of Markov decision processes. While much of the work has focused on optimizing discounted cumulative reward, optimizing average reward is sometimes a more suitable criterion. We formalize a class of such problems and analyze its characteristics, show...